repetitive pattern
- Asia > China > Beijing > Beijing (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
DiffTester: Accelerating Unit Test Generation for Diffusion LLMs via Repetitive Pattern
Yang, Lekang, Liu, Yuetong, Zhang, Yitong, Li, Jia
Software development relies heavily on extensive unit testing, which makes the efficiency of automated Unit Test Generation (UTG) particularly important. However, most existing LLMs generate test cases one token at a time in each forward pass, which leads to inefficient UTG. Recently, diffusion LLMs (dLLMs) have emerged, offering promising parallel generation capabilities and showing strong potential for efficient UTG. Despite this advantage, their application to UTG is still constrained by a clear trade-off between efficiency and test quality, since increasing the number of tokens generated in each step often causes a sharp decline in the quality of test cases. To overcome this limitation, we present DiffTester, an acceleration framework specifically tailored for dLLMs in UTG. The key idea of DiffTester is that unit tests targeting the same focal method often share repetitive structural patterns. By dynamically identifying these common patterns through abstract syntax tree analysis during generation, DiffTester adaptively increases the number of tokens produced at each step without compromising the quality of the output. To enable comprehensive evaluation, we extend the original TestEval benchmark, which was limited to Python, by introducing additional programming languages including Java and C++. Extensive experiments on three benchmarks with two representative models show that DiffTester delivers significant acceleration while preserving test coverage. Moreover, DiffTester generalizes well across different dLLMs and programming languages, providing a practical and scalable solution for efficient UTG in software development. Code and data are publicly available at https://github.com/wellbeingyang/DLM4UTG-open.
- Asia > China > Beijing > Beijing (0.04)
- Asia > Middle East > Israel (0.04)
- North America > United States (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
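The DiffTester abstract above hinges on one observation: unit tests for the same focal method repeat structure, so a decoder can safely commit several tokens per step wherever prior tests agree on the continuation. Below is a minimal, hypothetical sketch of that idea using token n-grams as a stand-in for the paper's abstract syntax tree analysis; the function names and the n-gram simplification are assumptions of this sketch, not the authors' implementation:

```python
from collections import defaultdict

# Hypothetical sketch of DiffTester's core idea with token n-grams standing in
# for the paper's AST-based pattern analysis: when every test generated so far
# agrees on how a context continues, commit several tokens in one step.

def build_continuations(tests, n=2):
    """Map each n-token context to the set of next tokens seen in prior tests."""
    table = defaultdict(set)
    for test in tests:
        toks = test.split()
        for i in range(len(toks) - n):
            table[tuple(toks[i:i + n])].add(toks[i + n])
    return table

def commit_span(context, table, max_extra=8):
    """Extend the context while prior tests agree on a unique continuation."""
    committed = []
    ctx = tuple(context)
    while len(committed) < max_extra:
        nxt = table.get(ctx)
        if nxt is None or len(nxt) != 1:
            break  # ambiguous or unseen context: fall back to one-token steps
        tok = next(iter(nxt))
        committed.append(tok)
        ctx = ctx[1:] + (tok,)
    return committed

prior_tests = [
    "def test_a ( ) : assert add ( 1 , 2 ) == 3",
    "def test_b ( ) : assert add ( 4 , 5 ) == 9",
]
table = build_continuations(prior_tests)
print(commit_span(["(", ")"], table))  # [':', 'assert', 'add', '(']
```

Here `commit_span` only accelerates where the pattern is unambiguous (the shared `: assert add (` scaffolding), and stops at the point where tests diverge (the argument values), mirroring the paper's goal of raising tokens-per-step without hurting test quality.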
Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models
Chen, Longze, Liu, Ziqiang, He, Wanwei, Li, Yunshui, Luo, Run, Yang, Min
Long-context modeling capabilities are important for large language models (LLMs) in various applications. However, directly training LLMs with long context windows is insufficient to enhance this capability, since some training samples do not exhibit strong semantic dependencies across long contexts. In this study, we propose a data mining framework \textbf{ProLong} that assigns each training sample a long-dependency score, which can be used to rank and filter samples that are more advantageous for enhancing long-context modeling abilities in LLM training. Specifically, we first use delta perplexity scores to measure the \textit{Dependency Strength} between text segments in a given document. We then refine this metric based on the \textit{Dependency Distance} of these segments to incorporate spatial relationships across long contexts. Final results are calibrated with a \textit{Dependency Specificity} metric to prevent trivial dependencies introduced by repetitive patterns. Moreover, a random sampling approach is proposed to optimize the computational efficiency of ProLong. Comprehensive experiments on multiple benchmarks indicate that ProLong effectively identifies documents that carry long dependencies, and LLMs trained on these documents exhibit significantly enhanced long-context modeling capabilities.
- Asia > Vietnam > Long An Province (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
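ProLong's Dependency Strength can be pictured with a toy stand-in: measure how much the perplexity of a later segment drops when an earlier segment is available as context. The sketch below substitutes a smoothed word-bigram model for the LLM the paper actually uses, so the numbers are only illustrative; the names (`train_bigrams`, `dependency_strength`) and the fixed `vocab_size` are assumptions of this sketch:

```python
import math
from collections import Counter

# Toy stand-in for ProLong's Dependency Strength: delta perplexity of a later
# segment without vs. with an earlier segment as context. A smoothed word
# bigram model replaces the LLM (an assumption of this sketch).

def train_bigrams(text):
    toks = text.split()
    return Counter(zip(toks, toks[1:])), Counter(toks[:-1])

def perplexity(text, bigrams, unigrams, vocab_size):
    toks = text.split()
    logprob = 0.0
    for pair in zip(toks, toks[1:]):
        # add-one smoothing keeps unseen bigrams at a finite probability
        logprob += math.log((bigrams[pair] + 1) / (unigrams[pair[0]] + vocab_size))
    return math.exp(-logprob / max(len(toks) - 1, 1))

def dependency_strength(seg_a, seg_b, vocab_size=50):
    """Positive when seeing seg_a makes seg_b more predictable."""
    big, uni = train_bigrams(seg_a)
    ppl_with = perplexity(seg_b, big, uni, vocab_size)
    ppl_without = perplexity(seg_b, Counter(), Counter(), vocab_size)
    return ppl_without - ppl_with

seg_a = "the quick brown fox jumps over the lazy dog"
seg_b = "the quick brown fox sleeps"
print(dependency_strength(seg_a, seg_b) > 0)
```

A segment pair sharing phrases scores positive, while unrelated segments score near zero; the paper's further Dependency Distance and Specificity calibrations would then discount nearby or boilerplate-driven matches.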
Analyzing Task-Encoding Tokens in Large Language Models
Bai, Yu, Huang, Heyan, Piano, Cesare Spinoso-Di, Rondeau, Marc-Antoine, Chen, Sanxing, Gao, Yang, Cheung, Jackie Chi Kit
In-context learning (ICL) has become an effective solution for few-shot learning in natural language processing. Past work has found that, during this process, representations of the last prompt token are utilized to store task reasoning procedures, thereby explaining the working mechanism of in-context learning. In this paper, we seek to locate and analyze other task-encoding tokens whose representations store task reasoning procedures. Supported by experiments that ablate the representations of different token types, we find that template and stopword tokens are the most prone to be task-encoding tokens. In addition, we demonstrate experimentally that lexical cues, repetition, and text formats are the main distinguishing characteristics of these tokens. Our work provides additional insights into how large language models (LLMs) leverage task reasoning procedures in ICL and suggests that future work may involve using task-encoding tokens to improve the computational efficiency of LLMs at inference time and their ability to handle long sequences.
- North America > United States > New York (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- North America > Canada > Quebec > Montreal (0.04)
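A real replication of the ablation in the abstract above needs access to a model's hidden states. As a lightweight illustration only, the hypothetical sketch below tags prompt tokens with the categories the paper studies (template, stopword, content), i.e. the positions whose representations an ablation hook would zero out; both token sets here are placeholder assumptions:

```python
# A real ablation needs model hidden states; this hypothetical sketch only
# tags prompt tokens with the categories the paper studies, i.e. the
# positions an ablation hook would zero out. Both token sets are placeholders.

STOPWORDS = {"the", "a", "an", "is", "of", "to", "and"}
TEMPLATE_TOKENS = {"Review:", "Sentiment:", "Question:", "Answer:"}

def tag_tokens(prompt):
    """Label each whitespace-delimited token as template, stopword, or content."""
    tags = []
    for tok in prompt.split():
        if tok in TEMPLATE_TOKENS:
            tags.append((tok, "template"))
        elif tok.lower() in STOPWORDS:
            tags.append((tok, "stopword"))
        else:
            tags.append((tok, "content"))
    return tags

demo = "Review: the movie is great Sentiment: positive"
print(tag_tokens(demo))
```

In the paper's setup, ablating the representations at the template and stopword positions is what degrades in-context learning the most, which is why those positions are the candidates worth tagging.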
What AI Can? What AI Cannot?
Nowadays almost everyone sells their software by claiming it contains AI, and the label is hugely popular. Start a discussion on any IT topic and AI will quickly become the hot subject: everyone speaks about it confidently and pours out self-imagined concepts that are mostly nowhere near reality, though it is good that people are taking an interest in such topics. I will try not to be too technical and will explain most things in layman's terms. If you want a more detailed and technical article about AI, you can check my other article.
Big data predictions: 8 analytics trends in 2020
In 2019, enterprise demands rose for real-time and near real-time analytics, and data continued to expand its role in everyday business operations and decision-making. Enterprises will continue to build on these trends in 2020, and that will drive analytics vendors to add new capabilities and expand their offerings. Here are eight key trends for analytics in 2020. In-memory costs are decreasing, and this will drive more analytics to real-time environments. The demand for real-time or near real-time analytics will require fast CPUs and in-memory processing.
Artificial intelligence segment heats up, here's all you need to know about it
The year 2015 proved to be crucial in the history of artificial intelligence (AI). It was the time when AI, which until then had largely fallen into the category of sci-fi, went mainstream. The late John McCarthy, one of the founders of the discipline, foresaw the autonomous car as far back as the 1960s. In April 2015, a car designed by Delphi Automotive became the first automated vehicle to complete a coast-to-coast journey across North America. In 2015, computer-aided diagnosis and treatment was first launched and is already being tried at 16 cancer institutes working with IBM's Watson Health artificial intelligence venture.
- Asia > India (0.43)
- North America > United States > California > San Francisco County > San Francisco (0.17)
- North America > United States > New York (0.05)
- Automobiles & Trucks (0.89)
- Transportation > Passenger (0.36)
- Transportation > Ground > Road (0.36)
- Information Technology > Services (0.35)
Defining the Complexity of an Activity
Sahaf, Yasamin (Washington State University) | Krishnan, Narayanan Chatapuram (Washington State University) | Cook, Diane J. (Washington State University)
Activity recognition is a widely researched area with applications in health care, security, and other domains. Because each recognition system considers its own set of activities and sensors, it is difficult to compare the performance of different systems, and, more importantly, it is challenging to select an appropriate set of technologies and tools for recognizing a given activity. In this work-in-progress paper we attempt to characterize activities in terms of a complexity measure. We define activity complexity along three dimensions – sensing, computation, and performance – and illustrate the parameters that characterize each dimension. We look at grammars for representing activities and use grammar complexity as a measurement of activity complexity. We then describe how these measurements can help evaluate the complexity of the activities of daily living commonly considered by researchers.
- Information Technology > Sensing and Signal Processing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.47)
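The abstract above proposes grammar complexity as a proxy for activity complexity. As a rough illustration (an assumed size-based metric, not the paper's exact definition), an activity can be encoded as a context-free grammar and scored by counting its production rules and right-hand-side symbols:

```python
# Hypothetical activity grammar for "make tea": nonterminals map to lists of
# alternative right-hand sides (each a sequence of sub-activities or sensor
# events). Both the encoding and the metric below are illustrative assumptions.
make_tea = {
    "MakeTea": [["BoilWater", "Steep", "Serve"]],
    "BoilWater": [["fill_kettle", "heat"]],
    "Steep": [["add_teabag", "wait"]],
    "Serve": [["pour"]],
}

def grammar_complexity(grammar):
    """Size-based complexity: production count plus total RHS symbols."""
    n_rules = sum(len(alts) for alts in grammar.values())
    n_symbols = sum(len(rhs) for alts in grammar.values() for rhs in alts)
    return n_rules + n_symbols

print(grammar_complexity(make_tea))  # 4 rules + 8 symbols = 12
```

A richer activity (more alternatives, deeper nesting of sub-activities) yields a larger grammar and hence a higher score, matching the intuition of comparing activities of daily living by the size of their representations.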